C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression
Bauvin, Baptiste, Baret, Loïc, Ahmad, Ola
Neural network compression has gained increasing attention in recent years, particularly in computer vision applications, where model reduction is crucial for overcoming deployment constraints. Pruning is a widely used technique that promotes sparsity in model structures, e.g. weights, neurons, and layers, reducing size and inference costs. Structured pruning is especially important, as it removes entire structures, which further accelerates inference and reduces memory overhead. However, it can be computationally expensive, requiring iterative retraining and optimization. To overcome this problem, recent methods consider a one-shot setting, applying pruning directly post-training. Unfortunately, they often lead to a considerable drop in performance. In this paper, we address this issue by proposing a novel one-shot pruning framework that relies on explainable deep learning. First, we introduce a causal-aware pruning approach that leverages cause-effect relations between model predictions and structures in a progressive pruning process. It allows us to efficiently reduce the size of the network while ensuring that the removed structures do not deter the model's performance. Then, through experiments on convolutional neural network and vision transformer baselines pre-trained on classification tasks, we demonstrate that our method consistently achieves substantial reductions in model size with minimal impact on performance, and without the need for fine-tuning. Overall, our approach outperforms its counterparts, offering the best trade-off. Our code is available on GitHub.
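The abstract gives only the high-level idea. As a hypothetical sketch (the toy linear model and all function names are assumptions, not C-SWAP's actual algorithm), a progressive, ablation-driven pruning loop that scores each structure by its causal effect on the prediction might look like:

```python
def predict(weights, x, mask):
    """Toy linear 'model': output under a per-structure binary mask."""
    return sum(m * w * xi for m, w, xi in zip(mask, weights, x))


def ablation_effect(weights, x, mask, i):
    """Causal effect of structure i: output change when it is removed."""
    base = predict(weights, x, mask)
    trial = list(mask)
    trial[i] = 0
    return abs(base - predict(weights, x, trial))


def progressive_prune(weights, x, tolerance):
    """Remove the least-influential structure repeatedly, stopping when
    removing any remaining structure would shift the prediction by more
    than `tolerance` (a sketch of progressive cause-effect pruning)."""
    mask = [1] * len(weights)
    while True:
        alive = [i for i in range(len(mask)) if mask[i]]
        if len(alive) <= 1:
            break
        i = min(alive, key=lambda j: ablation_effect(weights, x, mask, j))
        if ablation_effect(weights, x, mask, i) > tolerance:
            break
        mask[i] = 0
    return mask


# Structures with near-zero effect on the output get pruned first.
mask = progressive_prune([0.01, 2.0, -0.02, 1.5], [1.0] * 4, tolerance=0.1)
```

Here the two near-zero weights are removed while the influential ones survive, which is the intended one-shot behavior: no retraining, only ablation measurements.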
STADE: Standard Deviation as a Pruning Metric
Mecke, Diego Coello de Portugal, Alyoussef, Haya, Koloiarov, Ilia, Stubbemann, Maximilian, Schmidt-Thieme, Lars
Recently, Large Language Models (LLMs) have become very widespread and are used to solve a wide variety of tasks. To successfully handle these tasks, LLMs require longer training times and larger model sizes. This makes LLMs ideal candidates for pruning methods that reduce computational demands while maintaining performance. Previous methods require a retraining phase after pruning to maintain the original model's performance. However, state-of-the-art pruning methods, such as Wanda, prune the model without retraining, making the pruning process faster and more efficient. Building upon Wanda's work, this study provides a theoretical explanation of why the method is effective and leverages these insights to enhance the pruning process. Specifically, a theoretical analysis of the pruning problem reveals a common scenario in Machine Learning where Wanda is the optimal pruning method. Furthermore, this analysis is extended to cases where Wanda is no longer optimal, leading to the development of a new method, STADE, based on the standard deviation of the input. From a theoretical standpoint, STADE demonstrates better generality across different scenarios. Finally, extensive experiments on Llama and Open Pre-trained Transformers (OPT) models validate these theoretical findings, showing that depending on the training conditions, Wanda's optimal performance varies as predicted by the theoretical framework. These insights contribute to a more robust understanding of pruning strategies and their practical implications. Code is available at: https://github.com/Coello-dev/STADE/
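Wanda's criterion scores a weight by its magnitude times the norm of the corresponding input feature. A minimal sketch contrasting it with a standard-deviation-based score in the spirit of STADE (the exact STADE formula may differ; this is an assumed form for illustration):

```python
import math


def wanda_scores(weights, inputs):
    """Wanda-style score: |w_j| * ||x_j||_2, where x_j collects input
    feature j across calibration samples."""
    norms = [math.sqrt(sum(x[j] ** 2 for x in inputs))
             for j in range(len(weights))]
    return [abs(w) * n for w, n in zip(weights, norms)]


def stade_scores(weights, inputs):
    """STADE-style score: weigh |w_j| by the standard deviation of input
    feature j instead of its norm (assumed form, for illustration)."""
    n = len(inputs)
    scores = []
    for j, w in enumerate(weights):
        col = [x[j] for x in inputs]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scores.append(abs(w) * std)
    return scores


# A large but constant input feature: high norm, zero variance.
# Wanda ranks it as important; a deviation-based score does not.
xs = [[10.0, 1.0], [10.0, -1.0]]
w = [1.0, 1.0]
```

The constant-feature case is exactly where the two criteria disagree, matching the paper's point that which score is optimal depends on the input distribution seen during training.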
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures
Qin, Jiayu, Tan, Jianchao, Zhang, Kefeng, Cai, Xunliang, Wang, Wei
The remarkable performance of large language models (LLMs) in various language tasks has attracted considerable attention. However, the ever-increasing size of these models presents growing challenges for deployment and inference. Structured pruning, an effective model compression technique, is gaining increasing attention due to its ability to enhance inference efficiency. Nevertheless, most previous optimization-based structured pruning methods sacrifice the uniform structure across layers for greater flexibility to maintain performance. The heterogeneous structure hinders the effective utilization of off-the-shelf inference acceleration techniques and impedes efficient configuration for continued training. To address this issue, we propose a novel masking learning paradigm based on minimax optimization to obtain the uniform pruned structure by optimizing the masks under sparsity regularization. Extensive experimental results demonstrate that our method can maintain high performance while ensuring the uniformity of the pruned model structure, thereby outperforming existing SOTA methods.
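The key constraint in the abstract is layer-wise uniformity: every layer must keep the same number of structures so off-the-shelf acceleration applies. As an illustrative sketch only (the paper learns masks via minimax optimization under sparsity regularization; the hard top-k projection below is an assumed simplification), enforcing a uniform kept count from learned mask values might look like:

```python
def uniform_prune(layer_masks, keep):
    """Enforce the same kept-structure count in every layer: per layer,
    keep the `keep` largest mask values and zero out the rest, so the
    pruned model has a uniform structure across layers."""
    pruned = []
    for masks in layer_masks:
        top = set(sorted(range(len(masks)), key=lambda i: -masks[i])[:keep])
        pruned.append([m if i in top else 0.0 for i, m in enumerate(masks)])
    return pruned


# Two layers with different learned mask values, both reduced to
# exactly two kept channels each.
layers = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
```

This is the structural invariant the method optimizes toward; the minimax formulation replaces this hard projection with a regularized, differentiable learning process.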
Pruning for Sparse Diffusion Models based on Gradient Flow
Wan, Ben, Zheng, Tianyi, Chen, Zhaoyu, Wang, Yuxiao, Wang, Jia
Diffusion Models (DMs) have impressive capabilities among generative models, but are limited by slow inference and high computational costs. Previous works utilize one-shot structured pruning to derive lightweight DMs from pre-trained ones, but this approach often leads to a significant drop in generation quality and may remove crucial weights. We therefore propose an iterative pruning method based on gradient flow, comprising a gradient-flow pruning process and a gradient-flow pruning criterion. We employ a progressive soft pruning strategy to maintain the continuity of the mask matrix and guide it along the gradient flow of the energy function, based on the pruning criterion in sparse space, thereby avoiding the sudden information loss typically caused by one-shot pruning. The gradient-flow-based criterion prunes parameters whose removal increases the gradient norm of the loss function, enabling fast convergence of the pruned model in the iterative pruning stage. Extensive experiments on widely used datasets demonstrate that our method achieves superior efficiency and consistency with pre-trained models.
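The contrast with one-shot pruning is that mask entries decay gradually rather than jumping to zero. A minimal sketch of the progressive-soft-pruning idea (the decay rule and threshold here are assumptions for illustration, not the paper's energy-function formulation):

```python
def soft_prune_step(masks, saliencies, decay, threshold):
    """One iteration of progressive soft pruning: multiply the mask of
    each low-saliency parameter by `decay` instead of zeroing it, so the
    mask matrix stays continuous across iterations."""
    return [m * decay if s < threshold else m
            for m, s in zip(masks, saliencies)]


# Over several iterations the low-saliency mask fades toward zero,
# while the important parameter keeps a full mask.
masks = [1.0, 1.0]
for _ in range(3):
    masks = soft_prune_step(masks, saliencies=[0.01, 5.0],
                            decay=0.5, threshold=0.1)
```

In an iterative scheme the saliencies would be recomputed each step from the gradient-flow criterion; here they are fixed only to keep the sketch self-contained.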
Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure
Yang, Minghao, Gao, Linlin, Li, Pengyuan, Li, Wenbo, Dong, Yihong, Cui, Zhiying
Deep Neural Networks (DNNs) have been widely used for various applications, such as image classification [22; 40], object segmentation [33; 35], and object detection [6; 43]. However, the increasing size and complexity of DNNs often result in substantial computational and memory requirements, posing challenges for deployment on resource-constrained platforms, such as mobile or embedded devices. Consequently, developing efficient methods to reduce the computational complexity and storage demands of large models, while minimizing performance degradation, has become essential. Network pruning is one of the most popular methods in model compression. Specifically, current network pruning methods are categorized into unstructured and structured pruning [5]. Unstructured pruning [11; 24] focuses on eliminating individual weights from a network to create fine-grained sparsity. Although these approaches achieve an excellent balance between model size reduction and accuracy retention, they often require specific hardware support for acceleration, which is impractical for general-purpose computing environments. Conversely, structured pruning [23; 18; 29] avoids these hardware dependencies by eliminating redundant network structures, thus introducing a more manageable and hardware-compatible form of sparsity. As a result, structured pruning has become popular and is extensively utilized.
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights
Castells, Thibault, Song, Hyoung-Kyu, Kim, Bo-Kyeong, Choi, Shinkook
Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains a complex issue, presenting challenges such as memory consumption and inference speed. To address this issue, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward and task-agnostic method for evaluating model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to effectively quantify the impact of pruning on model performance, independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster convergence during training, as the model has less information to re-learn, thereby addressing the high computational cost of training. Consequently, our approach achieves a compressed model that offers improved inference speed and reduced parameter count, while maintaining minimal performance degradation. We demonstrate the effectiveness of our approach on three different tasks: text-to-image (T2I) generation, Unconditional Image Generation (UIG) and Unconditional Audio Generation (UAG). Notably, we reduce the inference time of Stable Diffusion (SD) by 34.9% while simultaneously improving its FID by 5.2% on MS-COCO T2I benchmark. This work paves the way for more efficient pruning methods for LDMs, enhancing their applicability.
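The task-agnostic part of the method is scoring a component by how far its removal moves the model's latent output, with no task metric involved. A hypothetical sketch (operator names and the Euclidean distance choice are assumptions for illustration):

```python
def latent_impact(latent_full, latent_ablated):
    """Distance in latent space between the full model's output and the
    output with one operator removed; a small distance suggests the
    operator is safe to prune, independently of the downstream task."""
    return sum((a - b) ** 2
               for a, b in zip(latent_full, latent_ablated)) ** 0.5


def rank_for_pruning(latent_full, ablated_latents):
    """Order components from least to most latent-space impact, i.e.
    the pruning order."""
    scores = {name: latent_impact(latent_full, z)
              for name, z in ablated_latents.items()}
    return sorted(scores, key=scores.get)


# "op_a" barely changes the latent; "op_b" moves it far, so the
# ranking prunes op_a first.
order = rank_for_pruning([1.0, 0.0],
                         {"op_a": [1.0, 0.01], "op_b": [0.0, 5.0]})
```

Working in latent space avoids decoding to pixels or audio, which is what makes the same score usable across T2I, UIG, and UAG.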
A Fair Loss Function for Network Pruning
Meyer, Robbie, Wong, Alexander
Model pruning can enable the deployment of neural networks in environments with resource constraints. While pruning may have only a small effect on the overall performance of the model, it can exacerbate existing biases in the model such that subsets of samples see significantly degraded performance. In this paper, we introduce the performance weighted loss function, a simple modified cross-entropy loss that can be used to limit the introduction of biases during pruning. Experiments using biased classifiers for facial classification and skin-lesion classification tasks demonstrate that the proposed method is a simple and effective tool that enables existing pruning methods to be used in fairness-sensitive contexts.
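The core mechanism is up-weighting the cross-entropy of samples from groups the model currently serves poorly. A minimal sketch (the weighting scheme `1 - group accuracy` is an illustrative assumption, not the paper's exact formula):

```python
import math


def performance_weighted_ce(probs, labels, groups, group_acc):
    """Cross-entropy where each sample's loss is scaled by how poorly
    its group is currently performing, so pruning updates cannot
    quietly sacrifice already-disadvantaged groups."""
    eps = 1e-8
    total = 0.0
    for p, y, g in zip(probs, labels, groups):
        weight = 1.0 - group_acc[g] + eps  # assumed weighting rule
        total += -weight * math.log(p[y] + eps)
    return total / len(labels)


# The same prediction costs more when the sample's group has low
# accuracy, steering pruning away from widening the gap.
probs = [[0.7, 0.3]]
loss_weak_group = performance_weighted_ce(probs, [0], ["g"], {"g": 0.5})
loss_strong_group = performance_weighted_ce(probs, [0], ["g"], {"g": 0.9})
```

Because it is just a reweighted loss, it can be dropped into any pruning method that fine-tunes or scores with cross-entropy.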
SBPF: Sensitiveness Based Pruning Framework For Convolutional Neural Network On Image Classification
Lu, Yiheng, Gong, Maoguo, Zhao, Wei, Feng, Kaiyuan, Li, Hao
Pruning techniques are used extensively to compress convolutional neural networks (CNNs) for image classification. However, the majority of pruning methods require a well pre-trained model to provide supporting parameters, such as the ℓ1-norm, BatchNorm values, and gradient information, which may lead to inconsistent filter evaluation if the parameters of the pre-trained model are not well optimized. Therefore, we propose a sensitiveness-based method that evaluates the importance of each layer from the perspective of inference accuracy by adding extra damage to the original model. Because accuracy is determined by the distribution of parameters across all layers rather than by individual parameters, the sensitiveness-based method is robust to updates of the parameters. Namely, we obtain similar importance evaluations of each convolutional layer for imperfectly trained and fully trained models. For VGG-16 on CIFAR-10, even when the original model is trained for only 50 epochs, we obtain the same evaluation of layer importance as when the model is fully trained. We then remove filters proportionally from each layer according to the quantified sensitiveness. Our sensitiveness-based pruning framework is verified efficiently on VGG-16, a customized Conv-4, and ResNet-18 with CIFAR-10, MNIST, and CIFAR-100, respectively.
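The sensitiveness measurement itself is simple to sketch: damage one layer at a time and record the accuracy drop. A hypothetical illustration (the `evaluate` and `damage` callbacks and the toy scalar "layers" are assumptions; the paper applies this to real CNN layers):

```python
def layer_sensitiveness(evaluate, damage, layers):
    """Sensitiveness of each layer = drop in the evaluation score when
    extra damage (e.g. noise or partial ablation) is applied to that
    layer only, leaving all other layers intact."""
    base = evaluate(layers)
    scores = []
    for i in range(len(layers)):
        damaged = [damage(l) if j == i else l
                   for j, l in enumerate(layers)]
        scores.append(base - evaluate(damaged))
    return scores


# Toy example: 'accuracy' is the sum of layer values, damage halves a
# layer; the layer contributing most is the most sensitive.
scores = layer_sensitiveness(sum, lambda w: w * 0.5, [3.0, 1.0])
```

Because the score compares whole-model behavior before and after damage, it depends on the parameter distribution of the layer rather than on any single weight, which is why an imperfectly trained model can already yield the final ranking.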
Adaptive Activation-based Structured Pruning
Zhao, Kaiqi, Jain, Animesh, Zhao, Ming
Pruning is a promising approach to compress complex deep learning models for deployment on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot run efficiently on commodity hardware, and they require users to manually explore and tune the pruning process, which is time-consuming and often leads to sub-optimal results. To address these limitations, this paper presents an adaptive, activation-based, structured pruning approach to automatically and efficiently generate small, accurate, and hardware-efficient models that meet user requirements. First, it proposes iterative structured pruning using activation-based attention feature maps to effectively identify and prune unimportant filters. Then, it proposes adaptive pruning policies for automatically meeting the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks. A comprehensive evaluation shows that the proposed method can substantially outperform state-of-the-art structured pruning works on the CIFAR-10 and ImageNet datasets. For example, on ResNet-56 with CIFAR-10, without any accuracy drop, our method achieves the largest parameter reduction (79.11%), outperforming related works by 22.81% to 66.07%, and the largest FLOPs reduction (70.13%), outperforming related works by 14.13% to 26.53%.
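An activation-based filter score can be sketched as the mean absolute activation of each filter over samples and spatial positions, which is one common way to build attention-style maps (an illustrative assumption here, not necessarily the paper's exact map):

```python
def filter_importance(activations):
    """Score each filter by its mean absolute activation across samples
    and spatial positions; filters that rarely activate score low and
    become pruning candidates.

    `activations` is a list of samples, each a list of per-filter
    flattened feature maps."""
    num_filters = len(activations[0])
    scores = [0.0] * num_filters
    for sample in activations:
        for f, fmap in enumerate(sample):
            scores[f] += sum(abs(v) for v in fmap) / len(fmap)
    return [s / len(activations) for s in scores]


# One sample, two filters with 2-element feature maps: the second
# filter never activates, so it scores zero and would be pruned first.
scores = filter_importance([[[1.0, -1.0], [0.0, 0.0]]])
```

An adaptive policy would then pick how many of the lowest-scoring filters to drop per iteration based on the accuracy, memory, or latency objective.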